{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In [Quick Start](https://github.com/DataCanvasIO/HyperTS/blob/main/examples/02_quick_start.ipynb), we learned the basics of HyperTS modeling:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from hyperts import make_experiment\n",
    "from hyperts.datasets import load_network_traffic\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "df = load_network_traffic()\n",
    "train_data, test_data = train_test_split(df, test_size=168, shuffle=False)\n",
    "\n",
    "experiment = make_experiment(train_data,\n",
    "                            task='forecast',\n",
    "                            timestamp='TimeStamp',\n",
    "                            covariables=['HourSin', 'WeekCos', 'CBWD'])\n",
    "model = experiment.run()\n",
    "\n",
    "X_test, y_test = model.split_X_y(test_data)\n",
    "forecast = model.predict(X_test)\n",
    "scores = model.evaluate(y_true=y_test, y_pred=forecast)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To get a better understanding of HyperTS, this NoteBook will go through the ```make_experiment``` tutorial in more detail, so you can explore more robust performance."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. Create an experiment by default.\n",
    "2. Select operation mode.\n",
    "3. Specify the evaluation metrics.\n",
    "4. Specify optimization direction.\n",
    "5. Set the maximum search trials.\n",
    "6. Set early stoping.\n",
    "7. Specify the validation data set.\n",
    "8. Specify search algorithm.\n",
    "9. Specify time frequency.\n",
    "10. Specify forecast window.\n",
    "11. Fixed random seed.\n",
    "12. Adjusting a log level.\n",
    "13. Discrete time series forecasting.\n",
    "14. Forecasting without timestamp column.\n",
    "15. Forecasting train data cut off.\n",
    "16. Set cross validation.\n",
    "17. Ensemble models."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1. Create an experiment by default."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we must tell the experiment what type of task we are going to do, that is, assign a value to the parameter ```task```. \n",
    "\n",
    "Second, in the forecasting task, we must pass the column name of the parameter ```timestamp``` to ```make_experiment```. \n",
    "If there are covariates, we also need to pass in the column names of ```covariables```(or ``covariates``). \n",
    "\n",
    "Thrid, in the classification task, if the target column of the data is not y or target, the parameter ```target``` needs to be set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Forecast\n",
    "experiment = make_experiment(train_data, \n",
    "                             task='forecast',\n",
    "                             timestamp='TimeStamp',\n",
    "                             covariates=['HourSin', 'WeekCos', 'CBWD'])\n",
    "# Classification\n",
    "experiment = make_experiment(train_data, task='classification', target='y')  \n",
    "\n",
    "# Regression\n",
    "experiment = make_experiment(train_data, task='regression', target='y')  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note**\n",
    "\n",
    "For the time series forecasting task, it may be divided into *univariate forecasting* and *multivariate forecasting* according to the number of the predicted variables. For the time series classification task, data can be divided into *univariate binary classification*, *univariate multiclassification*, *multivariate binary classification* and *multivariate multiclassification* according to the number and classes of feature nad target variables. If we already know the basic situation of the data and the task to be solved after getting the data, it is recommended to pass the following parameters in the configuration task: \n",
    "\n",
    "- multivariate forecasting: task='unvariate-forecast';\n",
    "- multivariate forecasting: task='multivariate-forecast';\n",
    "- univariate binary classification: task='univariate-binaryclass';\n",
    "- univariate multiclassification: task='univariate-multiclass';\n",
    "- multivariate binary classification: task='multivariate-binaryclass';:\n",
    "- multivariate multiclassification: task='multivariate-multiclass'.\n",
    "\n",
    "Of course, we can also simply configure ```task='forecast'```, ```task='classification'``` and ```task='regression'```. So that, HyperTS will perform detailed task type inference from the data combined with other known column information.\n",
    "\n",
    "--------------"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2. Select operation mode."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "HyperTS has three built-in modeling modes, namely statistical model mode ('stats'), deep learning mode ('dl') and neural architecture search mode ('nas', not open). By default, the statistical model mode is selected by default, but you can change to other modes as well:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "experiment = make_experiment(train_data, \n",
    "                             mode='dl',\n",
    "                             ...)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The deep learning mode based on Tensorflow, can support GPU. By default, the default experiment will run in the CPU environment. If your device supports GPU and installed the gpu version of tensorflow-gpu, you can change the parameter ```tf_gpu_usage_strategy```:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "experiment = make_experiment(train_data, \n",
    "                             mode='dl',\n",
    "                             tf_gpu_usage_strategy=1,\n",
    "                             ...)  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "where, ```tf_gpu_usage_strategy``` supports three configuration strategies:\n",
    "\n",
    "- 0: CPU;\n",
    "- 1: GPU memory growth;\n",
    "- 2: GPU memory limit, the default is 2048M, and the parameter ```tf_memory_limit``` supports custom configuration."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3. Specify the evaluation metrics."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By default, the model evaluation metric is 'mae' for prediction tasks, 'accuracy' for classification tasks, and 'rmse' for regression tasks. We can reset the evaluation metric through the parameter ```reward_metric```, which can be 'str' or a built-in function of ```sklearn.metrics```, for example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# str\n",
    "experiment = make_experiment(train_data, \n",
    "                             task='univariate-binaryclass',\n",
    "                             reward_metric='auc',\n",
    "                             ...)  \n",
    "\n",
    "# sklearn.metrics\n",
    "from sklearn.metrics import auc\n",
    "experiment = make_experiment(train_data, \n",
    "                             task='univariate-binaryclass',\n",
    "                             reward_metric=auc,\n",
    "                             ...) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Currently, ```reward_metric``` can support a variety of evaluation metrics, as follows:\n",
    "\n",
    "- Classification: accuracy, auc, f1, precision, recall, logloss.\n",
    "- Forecasting and regression: mae, mse, rmse, mape, smape, msle, r2."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " For custom ```reward_metric```, please refer to [06_custom_reward_metric.ipynb](https://github.com/DataCanvasIO/HyperTS/blob/main/examples/06_custom_reward_metric.ipynb).\n",
    "\n",
    "------------------"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 4. Specify optimization direction."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the model search phase, the searcher needs to specify the search direction, which will be detected from reward_metric by default. We can also specify via the parameter ```optimize_direction``` ('min' or 'max'):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "experiment = make_experiment(train_data, \n",
    "                             task='univariate-binaryclass',\n",
    "                             reward_metric='auc',\n",
    "                             optimize_direction='max',\n",
    "                             ...)  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<br>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 5. Set the maximum search trials."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By default, the experiment searchs for 3 parameter models and stops searching. In practice, it is recommended to set the parameter ```max_trials``` to more than 30. If time is sufficient, a larger number of searches will have a higher chance of obtaining a better model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "experiment = make_experiment(train_data, \n",
    "                             max_trials=100,\n",
    "                             ...) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<br>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 6. Set early stoping."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When the ```max_trials``` setting is large, it may take more time to wait for the experiment to finish. In order to control the rhythm of the work, you can control it through the **Early Stopping** mechanism:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "experiment = make_experiment(train_data, \n",
    "                             max_trials=100,\n",
    "                             early_stopping_time_limit=3600 * 3,  # 3 hours\n",
    "                             ...)    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "where, ```make_experiment``` contains three early stop mechanisms, which can be used together. Details as follows:\n",
    "\n",
    "- early_stopping_time_limit: Limits the running time of the experiment, with a granularity of seconds.\n",
    "- early_stopping_round: Limit the number of search rounds for the experiment, with a granularity of times.\n",
    "- early_stopping_reward: Specify a limit for reward points.\n",
    "\n",
    "---------------"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 7. Specify the validation data set."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The experiment object requires not only the training data set, but also the evaluation data set. By default, a part of the evaluation data set will be divided from the training data set in a certain proportion. You can also specify the evaluation set through eval_data during ```make_experiment```, such as:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "experiment = make_experiment(train_data, \n",
    "                             eval_data=eval_data,\n",
    "                             ...) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Of course, we can also specify the size of the evaluation dataset by setting ```eval_size```:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "experiment = make_experiment(train_data, \n",
    "                             eval_size=0.3,\n",
    "                             ...) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<br>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 8. Specify search algorithm."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "HyperTS performs model selection and hyperparameter optimization through the built-in search algorithms in [Hypernets](https://github.com/DataCanvasIO/Hypernets), including EvolutionSearcher (default, 'evolution'), MCTSSearcher ('mcts'), and RandomSearch ('random'), etc. It can be specified by the parameter ```searcher```, specifying the class name of the search algorithm (class) or the name of the search algorithm (str):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "experiment = make_experiment(train_data, \n",
    "                             searcher='random',\n",
    "                             ...)  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from hypernets.searchers import EvolutionSearcher\n",
    "\n",
    "search_space_general = ...\n",
    "\n",
    "experiment = make_experiment(train_data, \n",
    "                             searcher=EvolutionSearcher(search_space_general, population_size=500, sample_size=20, candidates_size=20),\n",
    "                             ...)  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For a detailed description of various search algorithms, please refer to [Search Algorithms](https://hypernets.readthedocs.io/en/latest/searchers.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 9. Specify time frequency."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In time series forecasting tasks, if we know the temporal frequency of the dataset, you can specify it precisely with the parameter ```freq```:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "experiment = make_experiment(train_data, \n",
    "                             task='forecast',\n",
    "                             timestamp='TimeStamp',\n",
    "                             freq='H',\n",
    "                             ...) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By default, frequency will be inferred from ```timestamp```."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 10. Specify forecast window."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When using the deep learning mode for time series forecasting, we can specify the size of the sliding window through the parameter ```forecast_window``` after analyzing the actual situation of the data based on experience:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "experiment = make_experiment(train_data, \n",
    "                             task='forecast',\n",
    "                             mode='dl',\n",
    "                             timestamp='TimeStamp',\n",
    "                             dl_forecast_window=24*7,\n",
    "                             ...)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<br>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 11. Fixed random seed."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Sometimes in order to ensure that the experimental results can be reproduced, we need to keep the same initialization. In this case, we can fix the random seed through the parameter ```random_state```:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "experiment = make_experiment(train_data, \n",
    "                             random_state=0,\n",
    "                             ...)  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 12. Adjusting a log level."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The progress messages during training can be printed by setting ```log_level``` (str or int). Please refer the logging package of python for further details. Besides, more comprehensive messages will be printed when setting ```verbose``` as 1."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "experiment = make_experiment(train_data, \n",
    "                             log_level='INFO', \n",
    "                             verbose=1,\n",
    "                             ...)  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 13. Discrete time series forecasting."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In some time series forecasting tasks, there may be no regular time frequency, i.e., discontinuous sampling. At this point, users can set ``mode='dl'`` and ``freq='null'`` to run experiment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "experiment = make_experiment(train_data,\n",
    "                            task='forecast',\n",
    "                            timestamp='TimeStamp',\n",
    "                            freq='null',\n",
    "                            ...)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 14. Forecasting without timestamp column."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For some time series forecasting data, there might be timestamp column, that is, only the target columns and covariates are contained. In this case, users could set ``timestamp='null'`` to run experiment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "experiment = make_experiment(train_data,\n",
    "                            task='forecast',\n",
    "                            timestamp='null',\n",
    "                            ...)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In addition, if the sampling frequency of data is known, it is recommeded to specify it by parameter ``freq``, which will facilitate data processing."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 15. Forecasting train data cut off."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the time series forecasting task, if the early too long historical data is involved in the training of the model, it may affect the final performance due to concept drift. ``forecast_train_data_periods`` can cut off the data for the specified period from the end of the training data forward."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "experiment = make_experiment(train_data,\n",
    "                            task='forecast',\n",
    "                            mode='stats',\n",
    "                            timestamp='TimeStamp',\n",
    "                            forecast_train_data_periods=24*10,\n",
    "                            ...)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 16. Set cross validation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To enhance the robustness of the model, users can specify whether to enable cross-validation through the parameter ``cv``. When ``cv`` is set to ``True``, it means that cross-validation is enabled, and the number of folds can be set by the parameter ``num_folds`` (default: 3)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "experiment = make_experiment(train_data,\n",
    "                            cv==True,\n",
    "                            num_folds=5,\n",
    "                            ...)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 17. Ensemble models."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In order to obtain better model performace, ``make_experiment`` can enable the model ensemble feature when creating an experiment, that is, specify the number of optimal models participating in the ensemble through the parameter ``ensemble_size``. When ``ensemble_size`` is set to ``None`` then model fusion is disabled (default)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "experiment = make_experiment(train_data,\n",
    "                            ensemble_size=10,\n",
    "                            max_trials=100,\n",
    "                            ...)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}